Feature Sets for the Automatic Detection of Prosodic Prominence
Abstract
This work presents a series of experiments that explore the utility of various acoustic features in classifying words as prosodically prominent or nonprominent. The experiments used a 35,009-word subset of the Buckeye Speech Corpus [12], drawn from fifty-four segments of the corpus. In a previous study, the words were transcribed for prosodic prominence by several teams of sixteen naive native speakers of English each, using the Rapid Prosody Transcription method developed in our prior work [10]. In the present study, we mapped the quasi-continuous prosody labels from the transcribed portion of the corpus to a binary prominence label: if at least one rater deemed a word prominent, it was labeled ‘prominent’; otherwise it was labeled ‘nonprominent.’ Of the 35,009 words, 15,955 were labeled ‘prominent’ and the remaining 19,054 ‘nonprominent,’ yielding a chance-level baseline of 54.4% (the share of the majority ‘nonprominent’ class). 90% of the words were used to train the learning algorithms and the remaining 10% were used for testing. Several acoustic correlates are associated with prominence, including F0, duration, and intensity [1, 2, 4, 17, 3, 6, 15, 16, 11, 7, 14, 18]. The relative contribution of these correlates to speech recognition and to recognition by humans is widely discussed in the literature [5, 18, 9, 13]. In the first set of experiments, Support Vector Machines (SVMs) were used. SVMs were chosen because the task maps an input vector to a class label, and SVMs perform well on such tasks. A set of 36 features was used, including both features known to correlate with prominence and features not known to do so, such as the length of the pause after a word.
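The label mapping and baseline described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the assumption that each quasi-continuous score is the fraction of raters who marked a word prominent, and the seeded random shuffle for the 90/10 split are all hypothetical choices made for illustration (the abstract does not say how the split was drawn).

```python
import random
from collections import Counter


def binarize(scores, threshold=0.0):
    """Map quasi-continuous prominence scores to binary labels.

    Assumption: a score is the fraction of raters who marked the word
    prominent, so any score above 0 means at least one rater did.
    """
    return ["prominent" if s > threshold else "nonprominent" for s in scores]


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always guessing it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def split_90_10(items, seed=0):
    """Shuffle a copy of items and split it 90% / 10% (assumed random split)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.9 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy scores: the first word was marked by no rater, the others by some.
toy_labels = binarize([0.0, 0.0625, 0.5])

# Label counts taken from the abstract: 15,955 prominent out of 35,009 words.
corpus_labels = ["prominent"] * 15955 + ["nonprominent"] * (35009 - 15955)
baseline_label, baseline_acc = majority_baseline(corpus_labels)
train, test = split_90_10(corpus_labels)
print(baseline_label, round(baseline_acc, 3), len(train), len(test))
```

With the counts reported in the abstract, the majority class is ‘nonprominent’ and always guessing it is correct for about 54.4% of the words, which matches the stated baseline.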
The ten best-performing features were, in order, the minimum energy of the final vowel normalized by phones, the ratio of the energy of the current word to that of the next word, the post-word pause duration, the word duration normalized by phones, the maximum energy of the last vowel normalized by phone class, the minimum value of F0 in the following word, the maximum energy of the stressed vowel normalized by phone class, the stressed vowel duration normalized by phone class, the minimum energy of the next word, and the maximum energy of the word. Classification accuracy was also tested with related features clustered into four groups: pause, duration, intensity, and pitch. The results are reported in Table 1. For the second set of experiments, Hidden Markov Models (HMM) with three hidden
Similar resources
Word Prominence Detection using Robust yet Simple Prosodic Features
Automatic detection of word prominence can provide valuable information for downstream applications such as spoken language understanding. Prior work on automatic word prominence detection exploits a variety of lexical, syntactic, and prosodic features and models the task as a sequence labeling problem (independently or using context). While lexical and syntactic features are highly correlated wi...
Automatic prominence identification and prosodic typology
This paper presents a follow-up of a study on the automatic detection of prosodic prominence in continuous speech. Prosodic prominence involves two different prosodic features, pitch accent and stress, that are typically based on four acoustic parameters: fundamental frequency (F0) movements, overall syllable energy, syllable nuclei duration and mid-to-high-frequency emphasis. A careful measurem...
Automatic detection of prosodic prominence in continuous speech
This paper presents work in progress on the automatic detection of prosodic prominence in continuous speech. Prosodic prominence involves two different phonetic features: pitch accents, connected with fundamental frequency (F0) movements and syllable overall energy, and stress, which exhibits a strong correlation with syllable duration and high-frequency emphasis. By deriving a set of acoustic ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection
A prominence model is proposed for enhancing prosodic features in automatic lexical stress and pitch accent detection. We make use of a loudness model and incorporate differential pitch values to improve conventional features. Experiments show that these new prosodic features can improve the detection of lexical stress and pitch accent by about 6%. We further employ a prominence model to take i...
Extending AuToBI to prominence detection in European Portuguese
This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however...
Publication date: 2010